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DNA methylation is an essential epigenetic mark that is required for normal development. Knockout of the DNA methyl- 
transferase enzymes in the mouse hematopoietic compartment reveals that methylation is critical for hematopoietic differ- 
entiation. To better understand the role of DNA methylation in hematopoiesis, we characterized genome-wide DNA 
methylation in primary mouse hematopoietic stem cells (HSCs), common myeloid progenitors (CMPs), and erythroblasts 
(ERYs). Methyl binding domain protein 2 (MBD) enrichment of DNA followed by massively parallel sequencing (MBD-seq) 
was used to map genome-wide DNA methylation. Globally, DNA methylation was most abundant in HSCs, with a 40% 
reduction in CMPs, and a 67% reduction in ERYs. Only 3% of peaks arise during differentiation, demonstrating a genome- 
wide decline in DNA methylation during erythroid development. Analysis of genomic features revealed that 98% of pro- 
moter CpG islands are hypomethylated, while 20%-25% of non-promoter CpG islands are methylated. Proximal promoter 
sequences of expressed genes are hypomethylated in all cell types, while gene body methylation positively correlates with 
gene expression in HSCs and CMPs. Elevated genome-wide DNA methylation in HSCs and the positive association between 
methylation and gene expression demonstrates that DNA methylation is a mark of cellular plasticity in HSCs. Using de novo 
motif discovery, we identified overrepresented transcription factor consensus binding motifs in methylated sequences. Motifs 
for several ETS transcription factors, including GABPA and ELF1, are overrepresented in methylated regions. Our genome- 
wide survey demonstrates that DNA methylation is markedly altered during myeloid differentiation and identifies critical 
regions of the genome and transcription factor programs that contribute to hematopoiesis. 



[Supplemental material is available for this article.] 

Epigenetic mechanisms of gene regulation are heritable, reversible 
modifications that are critical for the organization of chromatin 
and regulation of tissue-specific gene expression. DNA methyla- 
tion is a dynamic epigenetic mark primarily localized to cytosine 
residues in the context of a CpG dinucleotide in mammals. Tar- 
geted disruption of the genes responsible for de novo methylation 
and maintenance of DNA methylation during replication dem- 
onstrate that DNA methylation is essential for proper development 
in the mouse (Laget et al. 2010; Ma et al. 2010). While the critical 
role for DNA methylation in early development is clearly estab- 
lished, the role for DNA methylation in tissue specification is less 
understood. 

DNA methylation has long been recognized as an important 
mark in establishing and maintaining imprinted gene expression 
and X-chromosome inactivation. Apart from these specialized 
roles for DNA methylation, little is known about how DNA 
methylation leads to more general alterations in gene expression. 
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Methyl-binding domain proteins are a family of DNA-binding 
proteins that recognize methylated DNA and modify gene ex- 
pression by forming complexes with other regulatory proteins. 
Studies of mouse knockout models of the MBD proteins demon- 
strate unique but nonessential roles for most of these proteins 
(Bogdanovic and Veenstra 2009). Of the MBD proteins, MBD2 
appears to play an important role in hematopoiesis, with specific 
roles in globin gene switching (Rupon et al. 2006). 

The hematopoietic system is ideal for the study of methyla- 
tion during differentiation because primary cells at specific stages 
can be separated from other hematopoietic cells by flow cytometry. 
The hematopoietic stem cell (HSC) gives rise to all cells in the pe- 
ripheral blood. The common myeloid progenitor (CMP) generates 
only myeloid cells (red cells, platelets, granulocytes, monocytes, 
and eosinophils), but not lymphoid cells (T- and B-lymphocytes). 
Erythroblasts (ERYs) are nucleated red blood cells that have com- 
mitted to terminal differentiation. Conditional knockout mice in 
which the genes for the de novo methyltransferases Dnmt3a and 
Dnmt3b are deleted in HSCs retain the ability to differentiate into 
both myeloid and lymphoid lineages, but long-term repopulation 
of the hematopoietic system is impaired (Tadokoro et al. 2007). 
Serial transplantation of Dnmt3a deficient HSCs revealed impaired 
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differentiation as well as impaired repopulation (Challen et al. 
2012). Similarly, conditional knockout mice in which the gene 
for the maintenance DNA methyltransferase Dnmtl is deleted in 
HSCs demonstrated severe impairment of repopulating ability 
and inappropriate enhancement of mature myeloid lineages 
(Broske et al. 2009; Trowbridge et al. 2009). Together these 
studies demonstrate a profound role for DNA methylation in 
hematopoiesis. 

While the importance of DNA methylation in hematopoietic 
differentiation has been well established, the genome-wide local- 
ization of methylated DNA at specific stages of myeloid differen- 
tiation remains to be elucidated. Recent advances in sequencing 
technology have allowed comprehensive surveys of DNA meth- 
ylation with varying degrees of resolution. The highest-resolution 
techniques use bisulfite sequencing approaches that have the ad- 
vantage of single-base resolution but do not distinguish between 
5-methylcytosine and 5-hydroxymethylcytosine (Kriaucionis and 
Heintz 2009; Tahiliani et al. 2009). In this study we used a recombi- 
nant methyl-binding domain protein to enrich 5-methylcytosine 
modified regions of the genome for massively parallel sequence 
analysis (MBD-seq). Using this approach, we compared genome- 
wide methylation in purified populations of murine HSCs, CMPs, 
and ERYs. By focusing on methylation changes defined by peaks 
as opposed to site-specific methylation, we were able to identify 
discrete regions of the genome where dynamic methylation 
changes occur during hematopoiesis. Our study reveals that the 
greatest number of methylation peaks occurs in HSCs and that 
these peaks are specifically and successively lost during myeloid 
differentiation, consistent with the bias in myeloid lineages seen 
in Dnmtl knockout mice (Broske et al. 2009; Trowbridge et al. 
2009). The identification of regions where methylation changes 
during hematopoiesis will facilitate mechanistic studies of how 
DNA methylation regulates hematopoietic differentiation. 

Results 

Genome-wide DNA methylation declines during myeloid 
differentiation 

Whole-genome sequencing in human embryonic stem cells has 
demonstrated that as much as 80% of all CpG dinucleotides in the 
genome are methylated (Laurent et al. 2010), yet it is unlikely that 
all of these sites are relevant to differentiation. MBD-seq identifies 
regions of the genome bound by a DNA methylation-binding 
protein, highlighting loci containing multiple methylated CpG 
sites. Recombinant MBD2 pull-down of methylated genomic DNA 
combined with massively parallel sequencing was used to identify 
methylated genomic loci in enriched populations of lineage neg- 
ative Scal + c-kit + cells (a population of cells enriched for long- and 
short-term HSCs and multipotent progenitors, which we have 
designated as HSCs), common myeloid progenitors (CMPs), and 
erythroblasts (ERYs) (Fig. 1A). Two biological replicates were in- 
cluded for each of the three cell types with the total number of 
reads ranging from 32 to 46 million. A complete summary of the 
sequencing reads is provided in Supplemental Table SI. Non- 
enriched genomic DNA was sequenced to determine a standard 
background for subsequent analysis. Supplemental Figure S2 il- 
lustrates the sequencing results for a region containing the im- 
printing control region of the imprinted gene Snrpn. Consistent 
with the observation that Snrpn is imprinted within all cells, 
specific peaks of DNA methylation were detected in all three cell 
types. 
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Figure 1. Overview of MBD sequencing results. (A) Schematic repre- 
sentation of hematopoiesis highlighting the three cell types enriched for 
methylation analysis. HSC and CMP cell populations were lineage de- 
pleted (Lin") and then positively selected with the cell surface markers 
Seal and c-kit. Erythroblast cells were positively selected with antibodies 
forTerl 1 9 and CD71 . (B) Total number of methylation peaks called in each 
of the three cell populations. (C) Total methylation peak count per chro- 
mosome for each cell population: HSC (blue); CMP (orange); ERY (red). 



To quantify genome-wide DNA methylation levels, mapped 
sequencing reads were analyzed to determine statistically signifi- 
cant peaks of methylation. In a conservative filtering approach, 
MBD-seq peaks were called if they were present in both biological 
replicates and overlapped by at least 200 bp. The average distri- 
bution of peak lengths was also investigated, with the average 
peaks occupying slightly more than 800 bp in each cell type 
(Supplemental Table S2). Comparison of overall peak count from 
each cell type revealed that DNA methylation peaks were most 
abundant in HSCs (85,797), with decreasing levels in CMPs 
(50,638) and fewer in ERYs (27,839) (Fig. IB). The decrease in 
methylation reflects a global genomic decrease that is demon- 
strated by the similar relative distribution of methylation peaks on 
each chromosome between cell types despite the reduction in the 
total number of methylation peaks (Fig. 1C). Among the auto- 
somes, chromosome 5 has the greatest number of DNA methyla- 
tion peaks in all cell types, while the fewest number of peaks was 
found on chromosome 19. The X chromosome is subject to ran- 
dom X-chromosome inactivation in females, a process associated 
with DNA methylation. While all animals included in the HSC and 
CMP data sets were female, the fetal tissues used for ERY also 
included males resulting in a slightly lower density of peaks on the 
X chromosome in ERYs. 

Both in silico validation of CpG content within peaks as well 
as bisulfite sequencing of selected peaks were performed to validate 
that the MBD-seq approach specifically recognizes methylated 
DNA. The distribution of peak length is shown in Supplemental 
Table S2, and the distribution of CpG content is shown in Table S3. 
At least 99.96% of all peaks called contained one CpG site with the 
average CpG count ranging from 21 to 24 CpGs per peak (Sup- 
plemental Table S3). Random sampling of sequences of similar 
length revealed that the MBD-enrichment approach results in a 
statistically significant higher percentage of peaks with CpG sites 
than is expected by chance (P- value = 0.00507) and with a signifi- 
cantly higher average CG content (P- value = 0.00391). Bisulfite 
sequencing of 10 genomic loci containing MBD-seq peaks (five com- 
mon to all three data sets, four peaks unique to HSCs or progenitors 
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[HSCs + CMPs], and one peak unique to ERYs) were found to have 
at least 47% methylation at the CpG sites per region investigated 
with an average of —80% methylation (Supplemental Fig. S5; 
Supplemental Table S4). Bisulfite sequencing at genomic regions 
that were not identified as statistically significant peaks revealed an 
average of 30.7% methylated CpG sites, consistent with the ob- 
servation that CpG methylation is relatively common throughout 
the genome. The presence of low-to-intermediate levels of meth- 
ylation in regions excluded from peaks confirms that the MBD-seq 
approach identifies regions of the genome with high-density 
methylation. An additional validation of our MBD-seq data set was 
achieved by comparing our methylation peaks to the recently 
published (Shearstone et al. 2011) reduced representation bisulfite 
sequencing (RRBS) data from mouse erythroblasts. Supplemental 
Figure S6 shows the degree of overlap between the erythroblast 
MBD-seq peaks and the RRBS data. Consistent with the conven- 
tional bisulfite sequencing validation, the overlap between the two 
data sets increases dramatically when the RRBS values exceed 30%- 
40% methylation (Supplemental Fig. S6). 

Because global DNA methylation peaks decrease with mye- 
loid differentiation, we determined whether methylation peaks 
were shared among the cell types or whether the peaks were 
unique to each of the three cell types. Seven categories of DNA 
methylation peaks based on presence in one or more cell types are 
shown in Figure 2. The largest category of DNA methylation peaks, 
composing nearly 40% of all peaks, were unique to HSCs. Meth- 
ylation peaks common to all three cell types were the next largest 
category, representing 28% of all methylation peaks. The third 
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Figure 2. Analysis of peak sharing between cell populations. (A) Venn 
diagram illustrating the degree of overlap in methylation peaks between 
cell populations. (B) Total peak count for each cell population, the number 
of peaks shared between each cell type, and the percentage that each 
category contributes to the total number. Progenitor peaks are defined as 
present in HSCs and CMPs but absent in ERYs. Myeloid peaks are absent in 
HSCs but present in CMPs and ERYs. (Other) Peaks that were absent in the 
CMP but present in HSCs and ERYs. 



largest category was peaks present in HSCs and CMPs, but absent in 
ERYs (progenitor —27%). Approximately 3% of the total methyl- 
ation peaks were unique to either ERYs or CMPs, and an even 
smaller fraction of peaks (0.3%) were absent in HSCs but present in 
the more differentiated cell types (myeloid). Figure 3 A demon- 
strates examples of HSCs and progenitor-specific methylation 
peaks upstream of the important hematopoietic transcription 
factor gene Gata2. An example of an ERY-specific peak in the Meisl 
locus is shown in Figure 3B. Overall, the distribution of peaks 
within each of the cell types demonstrates a progressive global de- 
crease in methylation during differentiation with little acquisition of 
new methylation peaks during the later stages of differentiation. 

DNA methylation is overrepresented in the coding portion 
of the genome 

We next investigated the genomic distribution of methylation 
peaks. The genome was divided into five partitions defined by 
RefSeq genes. The proximal promoter was defined as sequences 
1 kb upstream of the transcription start site (TSS) through the first 
50 bp past the transcriptional start site. Gene body methylation 
was defined by the RefSeq gene coordinates minus the first 50 bp 
past the TSS. Additionally, we looked at the 10-kb regions upstream 
of and downstream from gene boundaries because we expected 
these flanking sequences to also contain cis regulatory elements. 
The remainder of the genome constitutes the intergenic partition. 
The distribution of methylation peaks in each cell-type category 
was compared with the relative percentage of sequence in each 
partition to the total nonrepetitive portion of the genome (geno- 
mic average). Significant deviation from the expected distribution 
was observed for common peaks and for peaks in all multipotent 
categories (Table 1). An overrepresentation of methylation peaks 
was seen in the RefSeq portion of the genome, and methylation 
peaks were underrepresented in intergenic sequences. Methylation 
peaks were overrepresented in the downstream flanking regions 
and were similar to the expected percentage in both the distal 
promoter and 5 '-flanking regions in all peak categories. 

We next investigated the distribution of peaks within the RefSeq 
portion of the genome. The RefSeq sequences were divided into four 
categories: 5 '-untranslated sequences (8.7%), 3 '-untranslated se- 
quences (6.8%), exons (5.3%), and introns (79.2%). Although exons 
represent —5% of the gene sequence, methylation peaks were sig- 
nificantly overrepresented in the exons (7%-25% of all the genie 
DNA methylation peaks) (Table 1) and underrepresented in the in- 
trons in most categories. Since the number of potential methylation 
sites is variable within genie regions, we next determined the relative 
contribution of CpG dinucleotides in each partition and compared 
this with peak distributions. Table 2 demonstrates that MBD-seq 
peak density does not follow CpG distribution, because methylation 
peaks exceed the relative contribution of CG in both exons and in- 
trons, while methylation peaks are found at less than the expected 
frequency in both the 5' and 3' UTR compartments (Table 2). 

CpG island methylation is rare in hematopoietic stem 
cell-specific peaks 

Previous studies have shown that CpG islands in characterized 
promoters of genes remain unmethylated in most instances, while 
CpG islands not associated with known promoters, orphan CpG 
islands, are much more likely to be methylated (Illingworth et al. 
2010). Consistent with a previous report of methylated CpG islands 
in whole mouse blood (Illingworth et al. 2010), we found —2% of 
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Figure 3. MBD-seq peaks at selected loci. UCSC Genome Browser view of the Gata2 locus on Chr6 (A) and the Meisl locus on Chrll (£). Raw 
sequencing data for each cell population: blue (HSC), orange (CMP), and red (ERY), with significant peaks shown as black bars below each sample. The 
/-axis indicates peak height, defined by the maximum number of sequencing tags seen in the highest visible peak in each window. Transcription factor 
consensus sites (TRANSFAC IDs) are indicated above the MBD-seq data in regions of significant peaks. The gene structure is indicated belowthe MBD-seq 
data with CpG islands shown in green. Bisulfite sequencing data confirming the cell-type-specific methylation are shown below. (Black circles) Methylated 
CpG sites; (open circles) unmethylated CpG sites. Each horizontal line indicates a unique clone. 



promoter CpG islands methylated in HSCs, and a smaller percent 
methylated in the more differentiated hematopoietic cell types 
(Fig. 4A). Orphan CpG islands were around 10-fold more likely to 
be methylated in each of the cell types, consistent with the greater 
role in tissue-specific methylation. 

Investigation of cell-type-specific methylation peaks revealed 
that the vast majority of all promoter CpG islands are unmethylated 
in all categories. Increased promoter-associated methylation observed 



in the progenitor category (HSCs + CMPs) compared with com- 
mon and HSC-specific promoters (Fig. 4B). In contrast, the meth- 
ylation of orphan CpG islands varied between peak-type cate- 
gories. The largest percentage of methylated orphan CpG islands 
was among the common methylation peaks (20%) (Fig. 4B). Al- 
though HSC is the most methylated cell type, the percentage of 
HSC-specific methylation peaks in orphan CpG islands was 10-fold 
less than common peaks and around threefold lower than the 
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Table 1. Distribution of methylation peaks in genomic partitions (% in partition) 
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progenitor category (Fig. 4B). These results suggest that the re- 
quirement of DNA methylation for maintenance of HSCs is not 
primarily due to silencing of genes via methylated CpG islands. 

DNA methylation in expressed genes varies 
by genomic partition 

To investigate the role that DNA methylation plays in gene ex- 
pression, we compared the genomic distribution of DNA methyl- 
ation peaks to gene expression data obtained from BloodExpress, 
a database containing gene expression profiles from a large num- 
ber of mouse hematopoietic cell types (Miranda-Saavedra et al. 
2009). Based on gene expression profiles for mouse HSC, CMP, and 
ERY cell types, the greatest number of genes is expressed in the 
HSCs (8594), followed by CMPs (7122), and the fewest number of 
expressed genes are seen in ERYs (4223). Peaks assigned to the 
upstream 10 kb, proximal promoter, RefSeq gene boundary, and 
downstream 10 kb were compared in expressed genes. Consistent 
with the observations of Hodges et al. (2011), the majority of 
expressed genes lacked methylation peaks in the proximal pro- 
moter, with a similar trend seen in the more upstream 10 kb 
(Fig. 5A,B). In contrast, the majority of expressed genes in HSCs 
and CMPs contained methylation peaks within the gene body 
(RefSeq) (Fig. 5 A). Methylation peaks were also investigated in 
genes that were not expressed in each cell type. Consistent with 
the larger absolute number of nonexpressed genes, the number of 
methylated nonexpressed genes exceeds the number of methyl- 
ated expressed genes for each partition (Supplemental Fig. S7). 
While the ratio of unexpressed methylated genes to expressed 
methylated genes is similar between the upstream, downstream, 
and RefSeq partitions, this ratio is elevated in the proximal pro- 
moter for each cell type and markedly elevated in ERYs. These data 
further support the role that proximal promoter methylation plays 
in gene silencing. 

On average, genes had multiple methylation peaks, with the 
RefSeq partition specifically averaging 3. 75 peaks per gene in HSCs, 
2.7 peaks in CMPs, and 2.1 peaks in ERYs. We investigated the 
association of methylation peaks in multiple genomic partitions 
simultaneously to ascertain if specific patterns were associated 
with positive gene expression. Figure 5C shows the percentage of 
expressed genes with methylation peaks in two or more of the four 
partitions: upstream 10 kb (UP), proximal promoter (PP), RefSeq 
gene boundaries (CDS), and 10 kb downstream (DS). Consistent 
with the single partition analysis, expressed genes are unlikely to 
have methylation in the proximal promoter in combination with 
any other genomic region. Of the two-partition correlations, 
methylation most frequently occurred together in the gene bodies 
(CDS) and the 10 kb downstream (DS) partitions, in each of the 
cell types, with the minimal two-partition correlation occurring in 
the proximal promoter and the downstream partitions (Fig. 5C). 



Further analysis of genes with methylation in three or more par- 
titions revealed that 10-fold more expressed genes had methyla- 
tion in the gene body and flanking regions than in all four parti- 
tions in each of the three cell types. Therefore, while proximal 
promoter methylation is rare in expressed genes, methylation in the 
gene body and flanking regions outside of the proximal promoter 
occurs often in expressed genes. 

De novo motif discovery reveals differentially methylated 
transcription factor binding site signatures of hematopoietic 
differentiation 

To elucidate developmental programs potentially modified by the 
presence of DNA methylation, we investigated the sequences un- 
derneath the methylation peaks for recurring motifs. De novo 
motif discovery was performed on all data sets and genomic par- 
titions independently. The top 25 overrepresented motifs were 
queried against the TRANSFAC (Matys et al. 2006) transcription 
factor database, and transcription factors with characterized binding 
sites were identified. Figure 6 contains a summary of the tran- 
scription factors with overrepresented motifs occurring at sites of 
DNA methylation within the five genomic partitions (identified 
on the left) in each peak category (shown on the right). The heatmap 
highlights transcription factors that occur in only one peak category 
(red) such as MATH1 in HSC proximal promoter peaks, two cate- 
gories (green), three categories (teal), and transcription factors that 
are overrepresented in almost all data sets (dark blue) such as 
GABPA. 

Comparison of the overrepresented transcription factor 
binding sites revealed that >10% of all overrepresented motifs 
corresponded to ETS transcription factors. Although the ETS 
transcription factor (ELF1, ETS2, FLU, and GABPA) binding 
sites were overrepresented in all five genomic partitions, the 
distribution within cell-type-specific categories was quite dif- 
ferent. Common methylation peaks had overrepresentation of 
all ETS factors in all partitions except in the intergenic partition. 
In contrast, HSC-unique peaks lacked overrepresentation of 
GABPA in the promoter and flanking partitions but exhibited 



Table 2. Peak partition distribution within genes compared with 
CG content 
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Figure 4. Overlap of MBD-seq peaks with CpG islands. (A) MBD-seq 
peaks in the three cell types compared with the complete genomic CpG 
islands classified by lllingworth et al. (2010). The percent of methylated 
promoter CpG islands (blue) or orphan (non-promoter) CpG islands (red) 
is shown for each cell type. (B) Methylated promoter and orphan CpG 
islands in common and cell-type-specific peaks, as a percentage of total 
CpG islands. 



overrepresentation of multiple ETS sites within the intergenic par- 
tition. These complementary patterns of transcription factor motifs 
suggest that DNA methylation may modulate ETS transcription 
factor occupancy during hematopoietic 
development. 

Other transcription factor motifs ^ 
such as the NFAT (nuclear factor of acti- 
vated T-cells) were highly specific to single 
partitions or cell types. The NFAT1 -binding 
site is overrepresented in CMP-unique 
methylation and myeloid-specific peaks 
located in the proximal promoter. The 
NFAT2 motif is overrepresented in CMP- 
unique methylation peaks located in the 
proximal promoter and downstream re- 
gions, and in the distal promoter of my- 
eloid peaks. In contrast, the NFAT3 motif 
is overrepresented in common and pro- 
genitor methylation peaks located in the 
proximal promoter. The striking differ- 
ence between overrepresented motifs in 
cell-type-specific methylation profiles 
demonstrates that hematopoietic line- 
ages contain unique signatures of methyl- 
ation patterns that may regulate changes 
that occur during differentiation. 



etic cell line, a multipotent cell that can be differentiated into 
various myeloid lineages (Pinto do 6 et al. 1998). Overlap between 
transcription factor peaks and MBD-seq peaks in CMPs and HSCs 
are shown in Table 3. The genomic coverage of each transcription 
factor data set and methylation peak data set was used to generate 
random data sets to calculate the expected genomic overlaps based 
on chance. A significant underrepresentation of overlap between 
transcription factor sites and methylation peaks occurred in both 
CMPs and HSCs (Table 3). The only transcription factor with 
significant co-occurrence with methylation was FLU in HSC 
methylation peaks. Of the 10 key hematopoietic transcrip- 
tion factors included in the study by Wilson et al. (2010), FLU 
is the only factor with an annotated consensus site that was 
overrepresented in our DNA methylation peaks. 

Chromatin immunoprecipitation of FLU and ELF1 reveals 
co-occupancy of transcription factors and methylation 

To further investigate the relationship between DNA methylation 
and transcription factor binding, we next identified the presence 
of CpG sites in the overrepresented transcription factor motifs. 
Table 4 contains a list of all transcription factors with known 
binding sites that were overrepresented in our analysis of the 
proximal promoter partition. Of the 19 transcription factor motifs 
associated with sequences ovenepresented in methylated proximal 
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Minimal genome-wide co-occupancy 
of hematopoietic transcription factors 
with DNA methylation peaks 

To assess the relationship between ge- 
nome-wide DNA methylation and tran- 
scription factor binding, we compared 
our methylation data with genome-wide 
occupancy data for a set of 10 key hema- 
topoietic transcription factors (Wilson 
et al. 2010). Transcription factor binding 
in Wilson et al. was ascertained in the 
immortalized HPC-7 mouse hematopoi- 
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Figure 5. Methylation in genie partitions of expressed genes. Gene expression was obtained from the 
BloodExpress (Miranda-Saavedra et al. 2009) gene expression database for each cell type and compared 
with MBD-seq peaks. (A) The number of expressed genes (/-axis) in each cell type with methylation 
peaks in the distal promoter, proximal promoter, RefSeq gene body, and downstream region. (B) The 
number of expressed genes (/-axis) lacking methylation peaks in each of the genie partitions for each cell 
type. (C) Combinatorial analysis of methylation peaks occurring in all four genie partitions in expressed 
genes. (UP) Upstream 1 0 kb; (PP) proximal promoter; (CDS) RefSeq gene boundary; (DS) downstream. 
The presence (+) or absence (-) of methylation in the indicated partition is shown below the x-axis. The 
/-axis shows the percentage of expressed genes with each methylation pattern. 
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Figure 6. Heatmap of overre presented methylated transcription factor 
binding motifs. The top 25 overrepresented sequence motifs for each 
peak category and each genomic partition are compiled into a heatmap. 
(Top) Transcription factors (TRANSFAC IDs) with confirmed consensus 
sequences. (Right) Each genomic partition with cell-type categories. (Red 
boxes) Transcription factor binding sites that are unique to one cell- 
type peak in a particular partition. Green boxes are present in two cell-type 
categories, teal boxes represent three categories, and blue boxes indicate 
presence in three or more cell-type specific categories. Gray indicates that 
the transcription factor is not overrepresented in the data set. 



promoters, five contained CpG sites within the consensus rec- 
ognition sequence, and five contained a CpG dinucleotide im- 
mediately adjacent to the binding sequence. Many consensus 
binding sites in our data set lack CpG sites; therefore, it is un- 
clear if methylation directly impacts binding of the transcription 
factors. 

To test the hypothesis that differential methylation directly 
impacts the binding of transcription fac- 
tors revealed by our de novo motif dis- 
covery, we performed chromatin immu- 
noprecipitation of the ETS factors ELF1 
and FLU. Methylation peaks primarily 
localized to promoter regions of 10 dis- 
tinct genes, as well as several genie peaks 
were selected for validation. ELF1 and 
FLU occupancy was compared in these 
regions of differential methylation in 
chromatin from CMPs and ERYs. Figure 
7 A shows the data for the endoglin (Eng) 
locus, where several conserved functional 
elements have been identified (Pimanda 
et al. 2006, 2008). Consistent with pro- 
moter methylation corresponding to 
gene silencing, this gene is unmethylated 
and expressed in CMPs, and uniquely 
methylated and silenced in ERYs, as 
determined by BloodExpress (Miranda- 
Saavedra et al. 2009). Although silenced, 
significantly increased enrichment of both 
FLU and ELF1 was observed in the ERY 
promoter methylation peak (Fig. 7B,C). 
Significant differences in binding between 



differentially methylated sites were observed in two intronic peaks 
of Eng. The methylation peak in the first intron (site 2) does not 
overlap a functional conserved element, while the methylation 
peak in intron 1 1 (site 3) corresponds to a conserved element that 
has not been assayed for function. In both cases, the more meth- 
ylated site was occupied by FLU and ELF1 (Fig. 7B,C). Overall, we 
found that four of six regions with statistically significant differ- 
ences in transcription factor occupancy showed increased binding 
in the methylated cell type, while two of six regions had signifi- 
cantly more binding in the less methylated cell type (Fig. 7B,C; 
Supplemental Fig. S8). We conclude that methylation does not 
prevent the binding of ELF1 or FLU. 

Discussion 

The widespread use of massively parallel sequencing approaches has 
led to breakthroughs in genomics, especially in the area of epige- 
netic control of gene regulation. A variety of approaches have been 
developed to map DNA methylation including whole-genome 
bisulfite sequencing (MethylC-seq), reduced representation bi- 
sulfite sequencing (RRBS), and the enrichment-based sequencing 
methods MeDIP-seq and MBD-seq. MethylC-seq requires more 
sequencing reads (50 X) than enrichment approaches to yield 
a comparable level of genome coverage, but provides single- 
nucleotide resolution (Harris et al. 2010). Reduced representation 
bisulfite sequencing provides an alternative to whole-genome bi- 
sulfite sequencing; however, it provides high-resolution data for 
only a limited portion of the genome (Meissner et al. 2005). The 
enrichment approaches detect methylated DNA fragments genome- 
wide and assume methylation of nearby CpG sites. Comprehensive 
comparison of the genome-wide method of MethylC-seq to the 
MBD-seq and MeDIP-seq enrichment methods revealed a >97% 
concordance rate for methylation calls throughout the genome 
(Harris et al. 2010). We conclude that the MBD-seq is an accurate 
and cost-effective approach to survey genome-wide DNA methyl- 
ation in small numbers of primary cells. In this study, we used 
MBD-seq combined with peak calling algorithms to describe DNA 



Table 3. Overlap between transcription factor occupancy and methylation peaks 
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Table 4. Overrepresented motifs in proximal promoter 
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Where different from TRANSFAC IDs, Mouse Genome Informatics-approved symbols are listed in 
parentheses. 



methylation. Direct comparison of our MBD-seq data with the 
reduced representation bisulfite sequencing (RRBS) data in pri- 
mary mouse erythroblasts demonstrated that our MBD-seq peaks 
are biased toward genomic regions with >30%-40% CpG methyl- 
ation (Supplemental Fig. S6). 

In mouse hematopoietic cells, we found 2%-3% methylated 
promoter CpG islands in all cell types, consistent with MBD-seq 
and MeDIP-seq studies in other mouse and human tissues (Weber 
et al. 2007; Illingworth et al. 2010). Although only a few percent of 
promoter CpG islands are methylated, CpG islands within genes 
are more likely to be methylated in a tissue-specific manner 
(Maunakea et al. 2010), and in our study, we identified that 20% of 
non-promoter CpG islands were methylated in all hematopoietic 
cells. Differential methylation of CpG islands, both promoter and 
non-promoter, was less common in cell-type-specific peaks. This 
observation is in agreement with the study recently published by 
Deaton et al. (2011) in which differential methylation of CpG is- 
lands between differentiated lymphoid cells was much less pro- 
found than the differences observed between lymphoid cells and 
brain cells. Our results are consistent with the hypothesis that 



differential methylation at CpG islands 
is more important for early stages of line- 
age specification than for final cell fate 
determination. 

A comparison of DNA methylation 
in pluripotent embryonic stem cells and 
lineage-committed cells has demon- 
strated a positive association between 
global DNA methylation levels and the 
ability to differentiate into multiple cell 
types (Lister et al. 2009; Laurent et al. 
2010). Similar to what we have observed 
in hematopoietic stem cells, studies of 
embryonic stem cells have shown that 
methylation is not restricted to tran- 
scriptional repression, because many 
expressed genes in stem cells contain 
DNA methylation within the gene body 
(Lister et al. 2009; Laurent et al. 2010). 
Additionally, the correlation between 
DNA methylation and the repressive his- 
tone modification H3K27me3 is not ob- 
served in pluripotent embryonic stem 
cells (Laurent et al. 2010). These findings 
suggest that DNA methylation in stem 
cells is not directly involved in in- 
activation of genes, but may instead be 
a mark enabling genomic plasticity. Our 
genome-wide survey of DNA methylation 
demonstrating high levels of methyla- 
tion in the multipotent hematopoietic 
stem cell with progressive loss of meth- 
ylation during erythroid specification 
supports the positive role for DNA 
methylation in cellular plasticity. 

DNA methylation dramatically de- 
creases as differentiation proceeds from 
hematopoietic stem cells to lineage-re- 
stricted common myeloid progenitor cells 
and the terminally committed erythro- 
blast. These data are consistent with the 
observations of Shearstone et al. (2011), 
who have also demonstrated global demethylation in terminally 
differentiating erythroid cells. These genome-wide profiles 
complement several other recent high-throughput surveys of DNA 
methylation in primary lymphoid (B- and T-) cells (Ji et al. 2010; 
Deaton et al. 2011; Shearstone et al. 2011). Ji et al. (2010) used an 
array-based method to investigate methylation at —4.6 million 
CpG sites in eight distinct hematopoietic cell types including HSC/ 
multipotent progenitor cells, CMPs, lymphoid progenitors, and 
differentiated T-cells. In contrast to the decreasing global DNA 
methylation we observed in differentiating myeloid and erythroid 
cells, both Ji et al. (2010) and Deaton et al. (2011) found little 
difference in the methylation of primitive and mature lymphoid 
cells. We conclude that methylation alterations play a much larger 
role in the differentiation of myeloid and erythroid cells than in 
the differentiation of lymphocytes. This conclusion is supported 
by the observation that mice deficient in Dnmtl have increased 
production of myeloid and erythroid cells (Broske et al. 2009; 
Trowbridge et al. 2009). 

We observed that regions of DNA methylation were 
overrepresented for ETS transcription factor binding sites and that 
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Figure 7. FLU and ELF1 occupancy of methylation peaks in the Eng gene. (A) UCSC Genome Browser 
view of the Eng locus on chromosome 2. Transcription factor consensus sites (TRANSFAC IDs) are in- 
dicated above the MBD-seq data in regions of significant peaks. QPCR data verifying enriched ChIP 
binding of FLU (B) and ELF1 (C) at three sites of methylation peaks throughout Eng. Site 1, an ERY- 
specific peak, is within the proximal promoter; site 2, an ERY-specific intronic peak; and site 3, a pro- 
genitor (HSC and CMP) peak in the 3' coding region. Comparison of binding in CMPs and ERYs revealed 
statistically significant differences of P = 0.004 (site 1, FLU), P = 0.009 (site 1 , ELF1 ), P= 0.021 (site 2, 
ELF1), and P = 0.01 2 (site 3, FLU). 



(Sun et al. 2005), and the ETS transcrip- 
tion factor ELF1 were overrepresented 
in the relatively rare erythroblast-unique 
methylation peaks. We studied two genes 
that are silenced in erythroid cells, Meisl 
and Eng, which have erythroid-specific 
methylation peaks that contain consen- 
sus sites for ELF1, and both of these sites 
are bound by ELF1 in erythroblasts. Al- 
though ELF1 has not yet been reported to 
function as a transcriptional repressor, 
the redundant binding of ETS factors 
(Hollenhorst et al. 2007; Okada et al. 
2011) combined with the recent de- 
scriptions of repressive functions of re- 
lated ETS factors leads us to hypothesize 
that ETS factors may silence critical genes 
for proper erythroid development. 

In summary, we provide a compre- 
hensive whole-genome map of DNA 
methylation that can be easily integrated 
with other occupancy data. The sequences 
subject to differential methylation during 
myeloid development lead us to suggest 
that DNA methylation may offer an addi- 
tional layer of modulation of key hema- 
topoietic transcription factors. This study 
provides insight into the interplay be- 
tween DNA methylation and transcrip- 
tional networks that are involved in he- 
matopoietic differentiation. 



there was less than expected overlap between methylation peaks 
and transcription factor occupancy. The ETS transcription factors 
are required for both the maintenance of hematopoietic stem cells 
and for myeloid development (Yang et al. 2011; Yu et al. 2011). 
While typically viewed as activators of transcription, several recent 
studies have characterized repressive functions of ETS factors 
(Dryden et al. 2012; Lee et al. 2012), suggesting that modulation of 
the binding of these factors may have complex effects on gene 
expression. Our negative correlation between methylation peaks 
and global transcription factor binding suggests that methylation 
negatively impacts the binding of these transcription factors. 
Alternatively, the lack of co-occurrence of methylation and tran- 
scription factor occupancy could be a consequence of transcription 
factors binding primarily in methylation-deficient regions such as 
promoters. Our chromatin immunoprecipitation analysis of FLU 
and ELF1 binding in promoters of multiple genes with differential 
methylation peaks in CMPs and ERYs demonstrated that the more 
methylated cell type often exhibited significantly enriched bind- 
ing, which indicates that neither interpretation is entirely correct. 
Since our methylation peaks span regions of several hundred base 
pairs, we do not know whether the exact nucleotides bound by 
transcription factors are methylated. Despite these limitations, we 
conclude that methylated regions may be important for differen- 
tial regulation of genes by transcription factors. 

While genomic methylation is most reduced in erythroblasts, 
some functionally important sites of methylation are retained, 
including the imprinted loci Snrpn (Supplemental Fig. S2) and H19 
(Shearstone et al. 201 1). Binding sites for transcription factors such 
as CKROX, a factor important for specification of CD4 T-cells 



Methods 

Enrichment of primary mouse cells 

Hematopoietic stem cells and common myeloid progenitor cells 
were harvested from adult female C57BL/6 bone marrow. After 
lysis of red cells in ACK lysing buffer (BioWhittaker), ~10 9 cells 
were subjected to lineage depletion using the following rat anti- 
mouse antibodies: CD8a, CD4, CDllb, Ly-6G/Ly-6C, CD45R, and 
Terll9 (BD Biosciences). Lineage depletion was performed as pre- 
viously described (Nemeth et al. 2003). Lineage-negative cells 
(lin~) were stained with PE anti-mouse Seal (clone D7) and APC 
anti-mouse CD117 (clone 2B8; BD biosciences). Lin~ Sca-1 + c-kit + 
(HSC; l%-2% of lineage depleted cells) and lin" Sca-1" c-kit + 
(CMP; 10% of lineage depleted cells) were sorted on a BD FACS Aria 
instrument. Images from a representative sort are shown in Sup- 
plemental Figure SI. Cells from five sorts (HSC) and two sorts 
(CMP) were combined to generate sufficient cells (~1 x 10 6 to 
2 X 10 6 ) for analysis. Erythroblasts were obtained from C57BL/6 d 
13.5 embryonic fetal livers disrupted into a single-cell suspension 
with a 21-gauge needle and stained with APC anti-mouse Terll9 
and FITC anti-mouse CD 71 antibodies (BD Biosciences). The cells 
were sorted into the five populations described by Zhang et al. 
(2003). Cytospins were stained with May-Grunwald Giemsa to 
demonstrate that the R3 population (CD71 + Terll9 + ) contained 
late basophilic erythroblasts (Supplemental Fig. SI). 

MBD2 enrichment of methylated DNA 

Genomic DNA was isolated from enriched cells with the QIAGEN 
Puregene kit and sonicated to 200- to 400-bp fragments. MBD2 
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enrichment was performed with the Active Motif MethylCollector 
kit. Approximately 1 |xg of sonicated genomic DNA was incubated 
with MBD2-His-conjugated protein and magnetic beads according 
to the manufacturer's protocol. Between four and eight MBD2- 
genomic DNA reactions for each biological replicate were purified 
simultaneously and pooled post-enrichment. After enrichment, 
both the methylated fraction and supernatant fractions were pu- 
rified with QIAGEN DNA purification columns. Quantitative PCR 
amplification of the differentially methylated regions regulating 
the imprinting of Snrpn and Rasgrfl and the unmethylated CpG 
island promoter of Actb was performed with SYBR Green PCR 
master mix (Applied Biosystems), and was used to validate the 
enrichment of methylated DNA using the MBD2-pull-down ap- 
proach (Supplemental Fig. S3). 

Bisulfite sequencing validation 

Genomic DNA from HSCs, CMPs, and ERYs was bisulfite-converted 
with the EZ DNA Methylation-Direct kit (Zymo Research). Briefly, 
500 ng of DNA was converted according to a standard protocol. 
The converted DNA was PCR-amplified with bisulfite primers 
designed using MethPrimer (Li and Dahiya 2002). The primer se- 
quences are listed in Supplemental Table S5. PCR-amplified bi- 
sulfite products were cloned using the TOPO TA Cloning kit 
(Invitrogen) and sequenced. Bisulfite sequences were analyzed 
with the Quantification Tool for Methylation Analysis (Kumaki 
et al. 2008). 

Next-generation sequencing of MBD2-enriched genomic DNA 

Two biological replicates of each enriched cell population and one 
supernatant sample per cell type were submitted for high- 
throughput sequencing analysis. Between 225 and 540 ng of 
MBD2-enriched DNA and 1 |xg of supernatant for each cell type 
were used to construct DNA libraries according to the Illumina 
protocol. The libraries were sequenced on the Illumina Genome 
Analyzer platform, and 36-bp single-end reads were used to 
uniquely identify the MBD2-bound fraction of the mouse genome. 

Mapping and peak calling for MBD2 enrichment 

Sequenced reads were mapped to the mouse genome (UCSC as- 
sembly mm9, NCBI build 37) using the ELAND (AJ Cox, unpubl.) 
short-read alignment program. Peaks for each cell type were called 
using MACS (Zhang et al. 2008) with a P-value threshold of P < 
10~ 5 . Sequenced reads from the matched supernatant were used as 
a control for each cell type. When MACS was unable to build a 
model for a given treatment/control pair, the model restrictions 
were lowered to compensate. Peak calls made on replicates were 
post-processed to generate a confident set. Replicate peaks that 
overlapped by >200 bp were considered for further analysis; at 
this cutoff the majority of overlaps were retained (Supplemental 
Fig.S2). 

The final sets of peaks were partitioned according to the ge- 
nomic segments they occupy based on the RefSeq gene coordinate 
map available for the mm9 genome build. Since DNA methylation 
peak lengths average 800 bp (Supplemental Table S2), peaks often 
span intron and exon segment boundaries. To assess the distribu- 
tion of DNA methylation within the context of more discrete se- 
quences within genes, the mean value of a peak range was used to 
assign it to a partition. 

In silico gene expression analysis 

Gene expression was determined using the BloodExpress database 
(Miranda-Saavedra et al. 2009). Briefly, data sets representing cells 



obtained from similar purification strategies were selected and 
compiled for each cell type. HSC gene expression represented the 
union of the LT-HSC, ST-HSC, and MPP data sets from multiple 
gene expression studies (Akashi et al. 2003; Chambers et al. 2007; 
Mansson et al. 2007; Ficara et al. 2008). CMP gene expression 
represents the compilation of similarly purified c-kit, scal~ pro- 
genitor expressed genes (Akashi et al. 2003; Jankovic et al. 2007) 
and ERY gene expression was determined based from Terll9 + 
normoblasts (Chambers et al. 2007). Full details of the cells used in 
the BloodExpress database are available from http://hscl.cimr.cam. 
ac.uk/bloodexpress/. 

Motif discovery 

The complete set of putative binding sites of length 5-10 were 
enumerated using the WordSeeker approach (Lichtenberg et al. 
2010). Each site was evaluated for its overrepresentation in regard 
to its sequence coverage using Markov chain background models 
(Lichtenberg et al. 2009) of varying order, ranging from 0 to the 
maximum allowed order for a given binding site length (maximum 
order = site length - 2). 

The set of enumerated and scored binding sites was sorted in 
decreasing order based on the overrepresentation score and filtered 
according to a word factor-based maximization [a site y is dis- 
carded if it is completely contained in a site x of equal or longer 
length and larger overrepresentation: x > y if x = uyv and scored) > 
score(y), with u and v being non-empty factors themselves]. The 
top 25 putative binding sites were retained for further analysis. 

The set of predicted binding sites was compared with the set of 
known transcription factor binding sites annotated in TRANSFAC 
(Matys et al. 2006). Overlaps between predicted and known binding 
sites were corrected for overlap to ensure that the TRANSFAC sites 
colocalized to the predicted sites in the input sequences. 

Signature generation 

For each known and putatively involved transcription factor, the 
number of input data sets in which it occurs was computed. Using 
the matrix2png application (Pavlidis and Noble 2003), a heatmap 
was created correlating each of the transcription factors with the 
input data sets characterizing the different stages of differentia- 
tion. The temperature of each match is annotated as the number of 
data sets that share the transcription factor involvement to allow 
highlighting of unique combinations. 

Occupation-methylation correlation analysis 

ChlP-seq localization data obtained by Wilson et al. (2010) were 
overlapped with the methylation peaks. Random peaks of equal 
cardinality and size were generated and used to quantify the 
overlap expected by chance. Based on 1000 bootstrapping itera- 
tions, the tabulated expected average overlap and standard de- 
viation were computed. In congruence with the observed overlap, 
Z-scores were inferred and subsequently used to compute P-values 
using the R toolkit (Ihaka and Gentleman 1996). 

Chromatin immunoprecipitation 

Primary mouse CMP and ERY cells were sorted as described above 
and fixed with 1% formaldehyde for 10 min at room temperature. 
Fixation was quenched with 0.125 M glycine for 10 min at room 
temperature. Cells were washed twice in 1 X PBS containing pro- 
tease inhibitor and frozen at -80°C. Chromatin immunoprecipi- 
tation was performed using the Magna ChIP A/G kit (Millipore). 
Briefly, cells were lysed and sonicated to 200- to 400-bp fragments; 
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1 X 10 6 cells were used for each IP. Antibodies for FLU (rabbit 
polyclonal, Abeam) and ELF1 (C-20) (rabbit polyclonal; Santa Cruz 
Biotechnology) were used for immunoprecipitation according to 
the Magna ChIP A/G kit protocol. Rabbit IgG was used as a negative 
control, and enrichment was determined by quantitative PCR as 
described above. A list of all of the ChIP primers used is available in 
Supplemental Table 6. All reactions were performed in duplicate, 
and IgG pull-down was used for normalization with the ACT 
method. Significant differences in occupancy between cell types 
was determined by a Student's t-test. 

Data access 

The data have been submitted to the NCBI Gene Expression 
Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under ac- 
cession no. GSE38354 and are also available on our website at 
http://tracks.msseeker.org. 
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